In silico prediction of the enzymes involved in the degradation of the herbicide molinate by Gulosibacter molinativorax ON4T

Gulosibacter molinativorax ON4T is the only known organism to produce molinate hydrolase (MolA), which catalyses the breakdown of the thiocarbamate herbicide into azepane-1-carboxylic acid (ACA) and ethanethiol. A combined genomic and transcriptomic strategy was used to fully characterize the strain ON4T genome, particularly the molA genetic environment, to identify the potential genes encoding ACA degradation enzymes. Genomic data revealed that molA is the only catabolic gene of a novel composite transposon (Tn6311), located in a novel low copy number plasmid (pARLON1) harbouring a putative T4SS of the class FATA. pARLON1 had an ANI value of 88.2% with contig 18 from Agrococcus casei LMG 22410T draft genome. Such results suggest that pARLON1 is related to genomic elements of other Actinobacteria, although Tn6311 was observed only in strain ON4T. Furthermore, genomic and transcriptomic data demonstrated that the genes involved in ACA degradation are chromosomal. Based on their overexpression when growing in the presence of molinate, the enzymes potentially involved in the heterocyclic ring breakdown were predicted. Among these, the activity of a protein related to caprolactone hydrolase was demonstrated using heterologous expression. However, further studies are needed to confirm the role of the other putative enzymes.


Results and discussion
Strain ON4 T genome sequencing. The 454 pyrosequencing of the G. molinativorax ON4 T genome (ON4 T 454 ) generated 305,068 reads, comprising 133.7 million nucleotides (nt), which were assembled in 163 contigs (PXVE00000000). Interested in disclosing the genes in the vicinity of molA, analyses started with a BLAST search of the molA gene sequence against the 163 assembled contigs. The complete molA gene sequence (1398 bp) was located in a single contig (1957 bp) but revealed no other open reading frame (ORF) in its vicinity. The strain ON4 T genome was resequenced using PacBio (ON4 T PacBio ), which generated 102,855 polymerase reads, resulting in 184,306 subreads with a mean subread length of 7254 bp, and 1,337 billion nt. These reads were de novo assembled and two contigs of 3,465,658 bp (linear) (contig A-CP028426) and 37,013 bp (circular) (contig B-CP028427) were generated. The assembled genome had 98.6% completeness, 0.6% contamination (Table S1) and 64% average G+C content for each contig. This in silico G+C content value agrees with previously published data 7 . The ON4 T PacBio genome assembly disclosed 3243 predicted protein-coding genes, most of them assigned to putative functions (75.9%) and matching 94.3% of the predicted protein-coding genes in the assembly of strain DSM13485 T (JGI, AUDX00000000.1) ( Table 1, Fig. 1, Table S1). Furthermore, forty-nine tRNAs, three 16S rRNA, three 23S rRNA, and three 5S rRNA genes were annotated ( Table 1). The occurrence of three 16S rRNA gene copies was further confirmed by determining the ratio between the 16S rRNA and the recA genes (single-copy gene) in both strains ON4 T and ON4 (−) ( Table S2).
As expected, the data retrieved using both sequencing technologies (454 and PacBio) were very similar (Fig. 1). Nevertheless, the long reads obtained by PacBio sequencing allowed to sum up the 163 assembled ON4 T 454 contigs into two contigs only. In addition, the comparison of the two draft genomes allowed closing the contig A (Fig. S1) and identifying the molA gene in the contig B. molA is the passenger gene of Tn6311. The molA gene, located in contig B, was surrounded by two identical insertion sequences (IS) of the IS3 group and family (Fig. 2a), whose original transposase belongs to the DDE transposase family 26 . The primary sequence of this IS was not identical to any IS described so far and was named ISGmo1 (https:// www-is. bioto ul. fr/). This structure, where a passenger gene (molA), which does not code for a transposition enzyme, is flanked by two IS (ISGmo1), is typical of a composite transposon 27,28 . This organization was confirmed by Sanger sequencing (data not shown). The novel composite transposon, named Tn6311, contained three copies of molA, each flanked upstream and downstream by ISGmo1 (a total of 4 ISGmo1 copies) (Fig. 2a). This tandem organization was primarily observed in specific PacBio raw reads (data not shown). Further, the three molA copies were confirmed in strain ON4 T using quantitative PCR, as indicated by the gene copy ratio between molA and a neighbour single-copy gene (parA) ( Table S2).
As described for other IS3-like sequences, ISGmo1 has a total length of 1287 bp and contains two partially overlapping open reading frames, orfA (291 bp) and orfB (1080 bp), which encode the DNA-binding and the catalytic domain of the transposase, respectively 27,[29][30][31] . Some IS3 family members might assemble different functional protein domains from one DNA segment, using a mechanism known as recoding programmed ribosomal frameshifting. This mechanism involves a ribosomal slippage of 1 nt, occurring at the heptanucleotide sequence A AAA AAG position, between orfA and the downstream fragment, orfB 26,31 . The slippage generates a −1 frameshift resulting in a fused product of the upstream frame (DNA-binding domain) and the downstream  S2). Although DDE transposases have a highly conserved catalytic structure, they show overall low sequence conservation 33 , which may explain the low identities found between the ISGmo1 transposase and those found in the ISfinder database (https:// www-is. bioto ul. fr/). Despite the low identities, ISGmo1 clusters with IS sequences of other Actinobacteria, suggesting a phylogenetic relationship (Fig. S2). In Tn6311, the inverted repeats on the left side (IRL) includes the start codon of the orfA/orfAB ISGmo1 genes and also two putative transposase promoters according to in silico identification (http:// genom e2d. molge nrug. nl/) (Fig. 2a). In contrast, the repeats on the right side (IRR) of ISGmo1, just upstream the molA gene, includes the last three codons of orfB and the − 35 region of the molA promoter. The structure of the molA gene regulatory region, i.e., the sequence of the regions − 35, − 10, and the ribosome binding sequence and their relative position towards the molA start codon (Fig. S3), suggests a strong promoter 15,34 . The existence of such a strong promoter combined with the tandem organization of molA gene in Tn6311 explains the high rate of molinate breakdown by strain ON4 T , suggesting a high gene expression response.
Interestingly, the ISGmo1 sequence was also found in contig A (ON4 T PacBio ), but without any passenger gene (Fig. 2b) and with different direct repeats sequences. Evidence of such a copy of ISGmo1 was also given by the ON4 T 454, DSM13485 T (JGI, AUDX00000000.1), and ON4 (−) sequenced genomes and confirmed by primer design and Sanger sequencing (data not shown).
Characterization of contig B: pARLON1. Contigs A and B were presumed, based on size, to be a chromosome and a plasmid, respectively. The detection of a band with approximately 30 kbp by Pulsed-Field Gel Electrophoresis (PFGE) (Fig. S4) supported this hypothesis. Moreover, no reads mapping to contig B were found in the draft genomes of strains DSM13485 T (JGI, AUDX00000000.1) and ON4 (−) (Fig. 1), an ON4 T variant unable to degrade molinate (Fig. S5). Altogether these data corroborate the existence of a mobile element enabling the degradation of molinate, i.e., a plasmid carrying the molA gene, which, most probably, was cured in strain ON4 (−) after 12 generations in culture medium without the herbicide. In turn, the absence of reads mapping to contig B in strain DSM13485 T (JGI, AUDX00000000.1) suggests that the conditions used to grow the culture collection strain did not favour the maintenance of the plasmid that is quite unstable. Hence, contig B was named pARLON1.
Plasmids are described as having modular structures, where each "module" is dedicated to a particular function (replication, stability, conjugation, establishment and adaption) 35 . pARLON1 contained 47 identified ORFs from which 36 ORFs (~ 27 kbp) belong to the plasmid backbone, and 11 ORFs belong to the adaptive module (Tn6311, ~ 10 kbp) (Fig. 3). pARLON1 did not share significant nucleotide identity with other described plasmids  (Table S3). LMG 22410 T is the type strain of Agrococcus casei, also a member of the family Microbacteriaceae, with a G+C content (65 mol%) closer to that of G. molinativorax (64 mol%) than to other Agrococcus species 7,38 . The pARLON1 genes sharing > 70% identity with genes of the A. casei contig were those belonging to the plasmid backbone (36 predicted genes), representing 46.3 and 53.8% of the full coverage of contig B and contig FUHU01000018.1, respectively. In contrast, the predicted genes belonging to the adaptive module of each of these genetic elements were different (Fig. 3, Table S4). Some ORFs of both pARLON1 and contig FUHU01000018.1 are related to the functional modules of transmissible plasmids, according to the annotation of pARLON1 and the in silico predictions performed by CONJscan 40 and by oriTDB 41 . These analyses predicted the existence of two essential proteins of the single-strand DNA conjugation machinery, the relaxase (MOB) and the type IV coupling protein (T4CP). The former is a critical component of the relaxosome responsible for initiating the DNA transfer in both conjugative and mobilizable plasmids, and the last couples the relaxase-DNA nucleoprotein complex to the type IV secretion system (T4SS) 42 . A T4SS is a large complex of proteins spanning from the cytoplasm to the cell exterior constituting the mating-pair formation (MPF) module 43,44 . ORF44 and ORF12 from strains ON4 T and A. casei, respectively, showed homology with MOB proteins, whereas ORF46/trsK and ORF10 were related to the TraG/TraD family protein whose members are homologs to T4CP 45,46 (Fig. 3, Table S4). However, only CONJscan predicted the existence of an ORF related to VirB4 ATPase (ORF4 and ORF7 from strains ON4 T and A. casei, respectively), a marker of a T4SS presence 43,44 (Fig. 3, Table S4). Hence, according to CONJscan, both ON4 T and A. casei harbour a putative T4SS of the class FATA 40 , which is a MPF family specific to Firmicutes, Actinobacteria, Tenericutes and Archaea 43,44 , agreeing with the taxonomic affiliation of both ON4 T and A. casei. Because members of these microbial groups lack an outer membrane, some of the core complex T4SS components (e.g., VirB7/10) of Agrococcus tumefaciens are not identified in all the well-characterized conjugation systems of MPF FATA type (e.g., pCF10 and pGO1) 44 . However, proteins homologous to VirB3, VirB6 and VirB8, proteins of the inner-membrane pore, www.nature.com/scientificreports/ are common within this system type 44 . Among these MPF proteins, CONJscan predicted the existence of PrgH in ON4 T (ORF2) and A. casei (ORF8), which show homology with VirB6 44 . Also, PrgI, which is suggested to have homology with VirB3 44 , was predicted in ON4 T (ORF3). In addition, CONJscan predicted the presence of two other ORFs (1 and 45, and 9 and 11 from strains ON4 T and A. casei, respectively) related to accessory genes detected at frequencies higher than 50% within the MPF FATA family (FATA_prgF and FATA_cd411). Although these predictions support the existence of a MPF FATA type, the presence of a functional MPF module in pARLON1 needs to be experimentally validated.
In addition, the pARLON1 backbone seems to carry a stability module. This module is particularly important in low copy number plasmids, where mechanisms, such as the partition (Par) systems, are required to ensure the even partitioning of the DNA molecules among progeny (vertical transmission) 35,47,48 . ORF22 and ORF28 from strains ON4 T and A. casei, respectively, were predicted to be related to the plasmid partition motor protein ParA. However, no homologous to other proteins of this system (e.g., ParB and ParS) were predicted. These results  www.nature.com/scientificreports/ may indicate that the Par system is incomplete/not functional (which could explain the existence of the cured strains DSM13485 T and ON4 (−) ), or that the existing DNA binding proteins have low homology to known Par proteins (lower homology was found for proteins of the ParB group than for group ParA, difficulting the classification of the former 49 ), or that another maintenance mechanism is present. At the population level, conjugative plasmids are described as low copy number, while mobilizable plasmids tend to be high copy number 35,50 . The ratio between the plasmid (parA) and chromosomal (recA) single-copy genes in strain ON4 T was 3:1 (Table S2), suggesting that pARLON1 is a low copy number plasmid 51 . The presence of ORFs potentially associated with the maintenance and conjugation modules, together with the plasmid size (> 30 kbp) and its low copy number, suggest that pARLON1 can be a conjugative plasmid 35 . However, further studies are necessary to validate its conjugative activity. As referred to above, the adaptive module of pARLON1 differed from that of contig 18 from strain A. casei LMG 22410 T . However, the similarity of the plasmid backbone and the identification of a putative MPF FATA in strains ON4 T and A. casei LMG 22410 T strengthens the existence of a common actinobacterial ancestor that, depending on the environment, acquired different adaptive modules. Moreover, plasmid evolution might have been mediated via the activity of the insertion sequences flanking the adaptive module, IS6100 in A. casei LMG 22410 T and ISGmo1 in Tn6311 of G. molinativorax ON4 T .
Identification of candidate genes involved in ACA mineralization. Given the absence of any other catabolic gene in the vicinity of molA, a transcriptomic approach was used to identify the potential genes involved in ACA mineralization. The gene expression profiles of strain ON4 T growing in molinate (mineral medium with the herbicide as the single source of energy, carbon and nitrogen, herein referred to as ON4 T MMM ) and without molinate (control medium LB, herein referred to as ON4 T LB ) were compared to identify the genes overexpressed in ON4 T MMM . After quality control, 14,380,135 and 15,330,180 reads in each library (ON4 T MMM and ON4 T LB , respectively) were retrieved. Analysis of the reads revealed that subtractive hybridization only removed a small amount of the ribosomal RNA, since 86-90% of reads mapped to rRNA. Consequently, approximately 1.4 and 1.0 million mRNA sequence reads were obtained from ON4 T MMM and ON4 T LB , respectively. After mapping the mRNA reads against the ON4 T genome sequence, 95% of the 3243 expected transcripts were expressed under the tested growth conditions (ON4 T MMM and/or ON4 T LB ) (Table S5). From the 3095 expressed transcripts, 2726 were expressed under both growth conditions and 335 were overexpressed in ON4 T MMM compared to ON4 T LB (Tables S6 and S7) . Of these, 149 transcripts were exclusive in ON4 T MMM , i.e., no reads were detected in ON4 T LB . RT-qPCR validated expression results in eight overexpressed genes in ON4 T MMM . The trend of expression level was similar to that of the transcriptomics experiment (Table 2).
Metabolic pathways for Gulosibacter sp. are not available in KEGG. Nevertheless, the predicted gene products from the 335 overexpressed transcripts in ON4 T MMM were mapped to the KEGG database 52 . Some of the overexpressed predicted gene products in ON4 T MMM mapped to KEGG Orthology (KO) categories (Fig. S6), namely to metabolism (e.g., amino acid, carbohydrates), environmental information processing (e.g., membrane transport) and Brite hierarchies (e.g., signaling and cellular processes), respectively. However, no complete KEGG pathway appeared to be overexpressed in ON4 T MMM , in particular, none that could be related to ACA mineralization.  www.nature.com/scientificreports/ Most of the predicted gene products mapped to more than one metabolic pathway and/or different predicted gene products mapped to the same metabolic function. Thereby, genes potentially involved in ACA mineralization were identified based on homology to previously described enzymes involved in the degradation of other xenobiotics 53-56 and on the structural similarity of the putative substrates (Fig. 4). Potential candidates involved in the breakdown of the heterocyclic ring containing metabolites are cytochromes P450 (CYP), which act as monooxygenases and are among the most versatile biological catalyst 57 , being, among others, involved in the degradation of different xenobiotics 57,58 . In typical class I P450 systems, the electrons required for oxygen activation are indirectly provided by NAD(P)H. The co-enzyme generally reduces a ferredoxin reductase (FDR), which in turn transfers the electrons to the second component of the system, usually a ferredoxin 57 . Finally, the ferredoxin mediates the electron transport between the FDR and the CYP 57 , further inducing the incorporation of one oxygen atom into the substrate via the reductive cleavage of oxygen.
Strain ON4 T harbours four ORFs with homology to cytochromes P450, namely bioI, cyp104, pipA and cpxP (Fig. 5, Table S8). Among these, bioI and its contiguous ferredoxin reductase (GMOLON4_437 and GMO-LON4_436, respectively) were highly expressed in ON4 T MMM (Log2FC > 6, Table S8). Also, cyp104 (GMO-LON4_1038) and cpxP (GMOLON4_3097) were overexpressed in ON4 T MMM (Log2FC > 3, Table S8), whereas pipA and its contiguous ferredoxin reductase (GMOLON4_3051 and GMOLON4_3050, respectively) were not significantly overexpressed in ON4 T MMM (Log2FC < 2, Table S8). Noticeably, the cytochromes P450 bioI and pipA of strain ON4 T have an amino acid identity of 72% and have a similar genetic neighbourhood (Fig. 5). Indeed, the upstream and downstream enzymes gene sequences seem to be similar, particularly the contiguous ferredoxin www.nature.com/scientificreports/ reductase GMOLON4_436 and GMOLON4_3050, respectively, which have an amino acid identity of 48% with each other. Also, the proteins potentially involved in the transfer of the electrons from the ferredoxin reductases to the cytochromes P450 bioI and pipA (GMOLON4_435 and GMOLON4_3049, respectively) share an amino acid identity of 69%. In contrast, these cytochromes (bioI and pipA) share less than 26% and 30% amino acid identity with the cytochromes cyp104 and cpxP, respectively, which are also distinct from each other (24% amino acid identity) (Fig. 5). Curiously, the cytochromes P450 bioI and pipA of strain ON4 T have an amino acid identity of approximately 53 and 52% with that of Mycobacterium smegmatis mc 2 155 (pipA, AAD28344.1) and Mycobacterium sp. HE5 (morA, AAV54064.1), respectively (Table S8). Both are shown to be involved in the hydroxylation of N-heterocycles 53,59 . M. smegmatis pipA was shown to be induced when strain mc 2 155 grew in piperidine as the sole source of energy, carbon, and nitrogen 53 . Mycobacterium sp. HE5 recombinant protein morA in combination with the adjacent pair ferredoxin reductase/ferredoxin was able to oxidize morpholine 59 . Although the contiguous bioI and pipA ferredoxin reductase (GMOLON4_436 and GMOLON4_3050, respectively) were distinct from those found in Mycobacterium strains (Fig. S7), the inhibition of ACA mineralization under anaerobic conditions 8 , combined with the similarity between the molecular structure of piperidine/morpholine (six/five membered N-heterocycles), and the predicted seven membered N-heterocycle product from ACA degradation in strain ON4 T (hexamethyleneimine) 8 (Fig. S7), supports the involvement of one of these systems in the hydroxylation of hexamethyleneimine to hydroxyhexamethyleneimine in strain ON4 T8 (step I, Fig. 4). However, further experimental validation is necessary.
Considering the further step of the putative degradation of molinate, the cleavage of the cyclic metabolite (caprolactam) into 6-aminohexanoic acid (step III, Fig. 4), two possibilities arose from the transcriptomics analysis. These were the gene products of hyuA, GMOLON4_3203, and hyuB, GMOLON4_3204 (hydantoin utilization protein A and hydantoinase B, respectively), and the gene product of chnC (GMOLON4_1493, caprolactone hydrolase). All were overexpressed in ON4 T MMM (Log2FC > 2, Table S6). GMOLON4_3203 and GMO-LON4_3204 showed the highest amino acid identity (72-90%) with proteins of the Hydantoinase-Hydantoinase B/Oxoprolinase family found in different members of the phylum Actinobacteria. Whereas GMOLON4_1493  Caprolactam is the raw material for manufacturing Nylon-6 by ring opening polymerization. Attempts to isolate organisms capable of attenuating the environmental pollution caused by caprolactam intensive production and utilization have been made. Up to now, few Gram negative and Gram positive bacterial strains, including Gulosibacter sp. BS4, were described as capable of caprolactam degradation 56,68-70 . Among these organisms, only the enzyme acting upon the cleavage of caprolactam in Pseudomonas jessenii GO3 was described 56,71 . This caprolactamase, which consists of two subunits with high sequence identity to 5-oxoprolinases, hydrolyzes caprolactam into 6-aminohexanoic acid in an ATP-dependent manner 56,71 . Such information suggested the possible involvement of the GMOLON4_3203 and GMOLON4_3204 gene products in the breakdown of caprolactam by strain ON4 T . However, heterologous expression of both potential candidates demonstrated that only the chnC gene product (GMOLON4_1493) degraded caprolactam in resting cells assays (Fig. 6). Proteins affiliated to this family are involved in the cyclic ε-caprolactone transformation into a linear organic acid, formed during the oxidation of cyclohexanol or cyclohexanone by both Gram staining positive and negative bacteria 54,55,66,67 . Hence, as far as we know, this is the first report of their involvement in caprolactam degradation.
Also, the overexpression of transcripts related to the metabolism of N-compounds, particularly glutamine synthetase GMOLON4_438/3052 ( Fig. 4 and Fig. S6, Table S8), agrees with the ability of strain ON4 T to grow on molinate as the single source of nitrogen 8 . The importance of glutamine synthetase and glutamate synthase cyclic mechanism in ammonium assimilation in prokaryotes is well known. In particular, Harper and collaborators 73 reported that either glutamate dehydrogenase or glutamine synthase are very responsive and regulated under high or low nitrogen availability in M. smegmatis.
Among the proteins potentially involved in molinate degradation, only those related with the cytochromes P450 BioI and PipA seem to be found in some strains of the genus Gulosibacter (amino acid identity of up to 72% www.nature.com/scientificreports/ and 91% respectively, Table S9). These results suggest that some Gulosibacter strains, such as G. chungangensis KCTC 13959 T , may have the ability to partially oxidize N-heterocycle compounds. Interestingly, the products of the genes putatively involved on hexamethyleneimine degradation in strain ON4 T showed relatively high amino acid identity with proteins involved in the oxidation of 6 membered N-heterocycle compounds (piperidine) and cycloalkane alcohols (cyclopentanol/cyclohexanol) by a wide diversity of strains, namely actinobacteria (Table S8). All these results suggest that small alterations on the structure of these proteins may have allowed the utilization of caprolactam in strains ON4 T and Gulosibacter sp. BS4 and ACA in strain ON4 T as energy, carbon, and nitrogen sources.

Conclusions
In summary, the present study, allowed the in silico identification and characterization of a novel transposon (Tn6311), which harbours molA, the enzyme responsible for the breakdown of the molinate to ACA and confers the unique degrading ability of G. molinativorax ON4 T . In addition, a novel transmissible plasmid (pARLON1), which is present in strain ON4 T and Agrococcus casei LMG 22410 T , was characterized in silico. This last finding is of particular importance as the characterization of actinobacterial transmissible plasmids (mobilizable or conjugative) is scarce in the literature, compared to other bacterial phyla, particularly Proteobacteria. The future characterization of the plasmid transmission pathway will enlighten the underrepresented type IV secretion system (T4SS) in the FATA group, which includes only 3 representatives 44 . Such knowledge would enrich databases such as CONJscan 40 and oriTDB 41 , contributing to better identifying this system, particularly in genomic mining studies.
Additionally, this study identified seven candidate genes involved in ACA mineralization, and validated the activity of the GMOLON4_1493 product, related to the caprolactone hydrolase chnC, in the degradation of caprolactam. The future characterization of these enzymes will contribute to enlightening the molinate degradation pathway in strain ON4 T and strengthening their biocatalytic potential.

Material and methods
Culture conditions. Gulosibacter molinativorax ON4 T and the variant strain ON4 (−) were preserved in 15% (v/v) glycerol suspensions at − 80°C 7 . Whenever necessary, stored cells were recovered in Luria-Bertani Agar supplemented with 1 mM molinate (LAM). Cultures were incubated at 30 °C for 72 h. For 454 pyrosequencing, strain ON4 T was grown on LAM, and for PacBio sequencing in mineral medium B 9 supplemented with 0.2 g/L yeast extract and 1.5 mM molinate (MMM). For Illumina sequencing, strain ON4 (−) was grown in LAM. For a detailed description see the Supplementary Material.

Genome analysis (sequencing, assembling and annotation). 454 pyrosequencing and PacBio tech-
nologies were used to sequence the genome of strain ON4 T . For 454 pyrosequencing, five hundred nanograms of high-quality genomic DNA were fragmented by nebulization and the sequencing adapters ligated to create double-stranded DNA libraries. The 454 GS FLX sequencing platform (Roche) at Genoinseq (Cantanhede, Portugal) was used following the standard manufacturer's protocols (see Supplementary Material). Sequencing reads were quality trimmed and de novo assembled by GS Assembler, version 2.9 (Roche) using default parameters.
Genome resequencing was performed using SMRT technology PacBio RSII on the RS sequencer at GATC Biotech AG (Constance, Germany), according to the manufacturer's instructions 74 . Genome assembly was performed on the SMRT analysis software (http:// www. pacb. com/ produ cts-and-servi ces/ analy tical-softw are/ smrt-analy sis/). The Hierarchical Genome Assembly Process (HGAP, RS HGAP Assembly.3), which includes pre-assembly, de novo assembly with PacBio's AssembleUnitig and assembly polishing with Quiver, was used with the default parameters, except for the minimum polymerase read quality, where 0.80 was used instead of 0.75. Gene prediction was carried out with GeneMarkS 75 using specific parameters for prokaryotes (genetic code 11) and functional annotation was performed using the Sma3s 76 . The completeness and contamination of the SMRT assembled genome of strain ON4 T were calculated with CheckM 77 .
For the draft genome of strain ON4 (-) a DNA library was prepared from one nanogram of high-quality genomic DNA with the Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, USA), which was sequenced using paired-end (PE) 2 × 300 bp on the MiSeq ® Illumina ® platform at Genoinseq (Cantanhede, Portugal). All procedures were performed according to the standard manufacturer's protocols. Following demultiplex and quality-filter procedures, the high quality and adapter free reads were assembled with SPAdes, version 3.7.1 78 , using in-house defined parameters.
Confirmation of the molA regulatory region. The 5′-Rapid Amplification of cDNA Ends (5′ RACE) was performed to identify the molA gene regulatory region using RNA extracted from strain ON4 T grown in MMM. For a detailed description see the Supplementary Material.
Comparison of different assemblies. The assemblies obtained from all the G. molinativorax genome sequencing projects (PRJNA245014 present study and PRJNA188907) were compared using MAUVE software 79 and the RAST server 80 . In addition, the raw reads of genomes ON4 T 454 (SRR6825088) and ON4 (−) (SRR6825089) were also mapped against the contigs obtained in SMRT analysis software (genome ON4 T PacBio ) (https:// www. pacb. com/ produ cts-and-servi ces/ analy tical-softw are/ smrt-analy sis/), using the BWA-MEM algorithm 81

Growth conditions, RNA extraction and sequencing of cDNA libraries. Given the low biomass
yield of strain ON4 T in mineral medium supplemented with a single carbon source other than molinate, strain ON4 T was grown up to the exponential phase in MMM (ON4 T MMM ) and Luria-Bertani Broth (ON4 T LB ) in 250 mL Erlenmeyer flasks to obtain mRNA libraries. Total RNA was extracted using the RNeasy extraction kit (QIAGEN GmbH) (detailed protocol in the Supplementary Material). The RNA quality was assessed using the RNA 6000 Pico Kit with the Prokaryote Total RNA Pico assay (Agilent 2100 Agilent Technologies, Santa Clara, CA, USA).
The subtractive hybridization protocol described by Stewart et al. 85 was used (detailed protocol in the Supplementary Material). The resulting rRNA-subtracted products were used to build the cDNA libraries for strandspecific RNA sequencing. The library preparation followed the protocol for whole transcription libraries in the Ion total RNA-Seq kit v2 (Life Technologies), using 500 ng of rRNA-subtracted products, as suggested by the manufacturer. Sequences were obtained in the Ion Proton™ platform at Genoinseq (Cantanhede, Portugal).
Normalization of the data from mRNA libraries. Ion Proton adapter sequences and low-quality bases were trimmed using the Torrent Suite software (Life Technologies). Trimmed reads were aligned to the strain ON4 T genome using TMAP version 4.0.6 (Life Technologies). Reads that mapped to the rRNA regions were excluded. The reads count was accessed using eXpress 86   The potential reference genes were selected among those without differences in expression between ON4 T MMM and ON4 T LB . Three reference genes (gyrA, gyrB, recA) were selected, and primers were designed using Oligo Explorer (http:// www. genel ink. com/ tools/ gl-oe. asp). For each reference gene, the primer pair specificity was assessed in silico, followed by qPCR amplification efficiency checking (Tables S10, S11) using different dilutions of DNA amplicons, and the amplification specificity was assessed using melting curves. The target genes were chosen considering differences in the expression values (|Log2FC|> 2) between ON4 T MMM and ON4 T LB . The primer choice and design and qPCR amplification efficiency and specificity (Tables S10, S11) for target genes were performed as described for reference genes (see Supplementary Material for detailed protocol).
RT-qPCR was performed in 96-well plates (Thermo Fischer Scientific) on the StepOnePlus system (Applied Biosystems). All RT-qPCR experiments were performed using three biological replicates, two technical replicates, and negative control (without cDNA). For detailed protocol see Supplementary Material. Evaluation and validation of reference gene expression stability and fold changes of target genes. The candidate reference genes stability was analysed using RefFinder (https:// omict ools. com/ reffi nder-tool), where a ranking of genes stability is obtained considering the data obtained from geNorm, Nor-mFinder, BestKeeper and deltaCt method [88][89][90][91] . The relative expression profile of each evaluated target gene was determined and normalized using a geometric mean value obtained from the amplification of the three best reference genes in all the samples (ON4 T MMM and ON4 T LB , respectively). Relative fold changes in gene expression were calculated using the Pfaffl method 92 , and data are displayed as mean ± standard error (SE).
Plasmid construction and cloning. All bacterial strains and plasmids used in the conjugants construction are listed in Table S12. The full sequence of the target genes (chnC and hyuA/hyuB) was amplified using total DNA of strain ON4 T as a template and specific primers with a NdeI restriction site (Table S13). An A-tailing was added to the purified PCR products and further inserted into pTZ57R/T vector. Escherichia coli JM109 was transformed with pTZ57R/T with each insert and recombinants were identified by blue/white colour selection and confirmed by colony PCR (Table S14). Recombinant pTZ57R/T vectors were extracted and further digested with the respective Speedy Restriction Enzymes (NZYTech), according to manufacturer's instructions (Table S13). The pET-30b(+) vector was restricted according to each target. After ligation, E. coli JM109 was transformed with the pET-30b(+) alone or carrying the target genes, and positive clones were selected from LB agar plates containing kanamycin (30 µg/mL) and screened by colony PCR (Table S14) www.nature.com/scientificreports/ of phosphate buffer (1:1 ratio). The resting cells were incubated at 30 ºC for 24 h. E. coli BL21(DE3) with the pET-30b(+) was used as biotic control. Caprolactam was quantified in cell-free supernatants, collected at 0 and 24 h of incubation, by High Performance Liquid Chromatograph (HPLC, VWR Hitachi Chromaster), following the method previously described 93 . See Supplementary Material for detailed protocol.

Data availability
The datasets generated and analysed during the current study are available in the respective databases. The genomes obtained were deposited in GenBank under the accession numbers PXVE00000000 (genome ON4 T 454 ), CP028427 and CP028426 (genome ON4 T PacBio ) and PXVD00000000 (genome ON4 (−) ). The ISGmo1 sequence was deposited in the ISfinder database (https:// www-is. bioto ul. fr/). The composite transposon carrying molA was named Tn6311, according to the Tn number registry (http:// www. ucl. ac. uk/ eastm an/ resea rch/ depar tments/ micro bial-disea ses/ tn). The data from the cDNA libraries obtained in this work have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession numbers GSM3307305 and GSM3307306 for ON4 T MMM and ON4 T LB data, respectively. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.